Estimation of influential points in any data set from coefficient of determination and its leave-one-out cross-validated counterpart

Authors

  • Gergely Tóth
  • Zsolt Bodai
  • Károly Héberger

Abstract

The coefficient of determination (R²) and its leave-one-out cross-validated analogue (denoted by Q² or R²cv) are the most frequently published values used to characterize the predictive performance of models. In this article we use R² and Q² in reverse, to detect uncommon points, i.e. influential points, in any data set. The term (1 - Q²)/(1 - R²) corresponds to the ratio of the predictive residual sum of squares (PRESS) to the residual sum of squares (RSS). This ratio correlates with the number of influential points in experimental and random data sets. We propose an (approximate) F test on the (1 - Q²)/(1 - R²) term to quickly pre-estimate the presence of influential points in the training sets of models. The test is founded upon the routinely calculated Q² and R² values and warns model builders to verify the training set, to perform influence analysis, or even to switch to robust modeling.
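
The ratio identity follows from the usual definitions R² = 1 - RSS/TSS and Q² = 1 - PRESS/TSS, where TSS is the total sum of squares about the mean. Dividing 1 - Q² = PRESS/TSS by 1 - R² = RSS/TSS gives PRESS/RSS. The Python below is a minimal sketch, not the authors' code. It assumes a one-descriptor least-squares model and uses hypothetical toy data. It computes R², leave-one-out Q² and the ratio, obtaining PRESS by refitting with each point left out in turn. The abstract does not give the degrees of freedom of the proposed F test, so the sketch reports only the ratio and no critical value.

```python
# Sketch: R², leave-one-out Q² and the PRESS/RSS ratio for a one-descriptor
# ordinary least-squares model (the model choice is an illustrative assumption).


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def r2_q2_ratio(xs, ys):
    """Return (R2, Q2, PRESS/RSS), computing PRESS by explicit leave-one-out refits."""
    n = len(ys)
    my = sum(ys) / n
    tss = sum((y - my) ** 2 for y in ys)  # total sum of squares about the mean

    # RSS: residuals of the model fitted on all n points.
    a, b = fit_line(xs, ys)
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

    # PRESS: refit without point i, then predict the left-out point i.
    press = 0.0
    for i in range(n):
        a_i, b_i = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (a_i + b_i * xs[i])) ** 2

    r2 = 1.0 - rss / tss
    q2 = 1.0 - press / tss
    return r2, q2, press / rss


# Hypothetical toy data: roughly y = 2x, then the same data plus one
# high-leverage point that contradicts the trend.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 13.9, 16.2]
clean = r2_q2_ratio(xs, ys)
with_point = r2_q2_ratio(xs + [20], ys + [10.0])

for label, (r2, q2, ratio) in [("clean", clean), ("influential", with_point)]:
    print(f"{label}: R2={r2:.3f} Q2={q2:.3f} (1-Q2)/(1-R2)={ratio:.2f}")
```

For least squares each left-out residual equals e_i/(1 - h_ii), where h_ii is the point's leverage. It is therefore never smaller than the ordinary residual, and the ratio is at least 1. A single high-leverage point that contradicts the trend inflates PRESS far more than RSS, which is the kind of shift the proposed test is meant to flag.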

Similar resources

Prediction of the adsorption capability onto activated carbon of liquid aliphatic alcohols using molecular fragments method

Quantitative structure-property relationships (QSPR) for estimating the adsorption of aliphatic alcohols onto activated carbon were developed using the substructural molecular fragments (SMF) method. The adsorption capacity of activated carbon (AC) (g/100 g C) for 150 aliphatic alcohols is studied under equilibrium conditions. Forward and backward stepwise regression variable ...

QSRR Study of Organic Dyes by Multiple Linear Regression Method Based on Genetic Algorithm (GA–MLR)

Quantitative structure-retention relationships (QSRRs) are used to correlate paper chromatographic retention factors of disperse dyes with theoretical molecular descriptors. A data set of 23 compounds with known RF values was used. A genetic algorithm-multiple linear regression (GA-MLR) model with three selected theoretical descriptors was obtained. The stability and predictability of the ...

Determination of the attenuation coefficient for megavoltage photons in the water phantom

Background: The attenuation coefficient (μ) plays an important role in the calculations of treatment planning systems, as well as in the determination of dose distributions in external beam therapy, dosimetry, protection, phantom materials and industry. Its exact measurement or calculation is therefore very important. The aim of this study was to evaluate μ at different points in the water phantom anal...

Determination of genetic uniformity in transgenic cotton plants using DNA markers (RAPD and ISSR) and SDS-PAGE

One concern about using transgenic plants is the genetic variation arising from their tissue culture and regeneration. Molecular markers are an important element for the efficient and effective determination of genetic variation. The present work was carried out to assess the genetic uniformity of transgenic cotton (Bt and chitinase lines), using RAPD, ISSR molecular markers and SDS-PAGE an...

Quantification of Polyethylene Glycol Esters of Methotrexate and Determination of Their Partition Coefficients by Validated HPLC Methods

Conjugation of methotrexate (MTX) (MW 454) with polyethylene glycol (PEG) of different molecular weights, including methoxy-PEG (mPEG) 750 D and 5000 D and diol-PEG 35000 D, led to compounds that are physicochemically highly different from the parent compound, MTX. In this study, an HPLC system consisting of a C8 column and UV detector (λ=342 nm), using a mixture of 30:70 v/v phosphate-citrate buff...

Journal title:
  • Journal of computer-aided molecular design

Volume 27, Issue 10

Publication date: 2013